{ "cells": [ { "cell_type": "markdown", "id": "45438f47", "metadata": {}, "source": [ "# Running a Hydrometeorological Ensemble Verification Pipeline with _veriflow_\n", "\n", "Forecast verification helps us evaluate the quality and behaviour of forecasts. This can include comparing forecasts with observations, assessing skill across lead times, examining forecast uncertainty, or checking whether forecast distributions are realistic under different hydrological conditions.\n", "\n", "_veriflow_ is a Python-based framework for running forecast verification workflows in a consistent and reproducible way. It uses a configuration-driven approach to load data, define verification experiments, compute verification metrics and statistical criteria, and return results. It can be used to verify both deterministic and probabilistic ensemble forecasts.\n", "\n", "## 1. What this notebook does\n", "\n", "This notebook focuses on the **workflow**, with an initial visualization of verification results:\n", "\n", "1. Define a verification configuration\n", "2. Run the _veriflow_ pipeline\n", "3. Inspect the returned results\n", "4. Explore the available data and diagnostics\n", "5. Visualize the results with the interactive app\n", "\n", "The aim of this notebook is to demonstrate usage of _veriflow_, not to provide a complete overview of forecast verification theory. \n", "\n", "A second notebook focuses on the **interpretation** of the data, verification results, and forecast quality.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "554ed2d8", "metadata": {}, "source": [ "## 2. Dataset and case study\n", "\n", "This example uses hydrometeorological data for the River Rhine basin. We focus on **discharge** and compare three discharge forecast datasets. Each dataset consists of a 5-member ensemble reforecast. The discharge forecasts were generated by forcing the HBV hydrological model with raw or post-processed ECMWF ensemble reforecasts of precipitation and temperature.\n", "\n", "The forecast datasets are:\n", "- **`raw_raw`**: baseline streamflow forecast driven by raw meteorological ensemble forecasts\n", "- **`lin_log`**: streamflow forecast driven by meteorological ensembles post-processed using a linear-log transformation\n", "- **`qqt_qqt`**: streamflow forecast driven by meteorological ensembles post-processed using a quantile-to-quantile transformation\n", "\n", "In this notebook, all forecast datasets are evaluated against the same observation dataset. We focus on a key location, **Lobith**.\n", "\n", "The data are read directly from a remote Zarr store, so no local download is required. For more information, see the References section." ] }, { "cell_type": "markdown", "id": "77f5494d", "metadata": {}, "source": [ "## 3. Imports and environment setup\n", "\n", "We start by importing the required Python libraries and _veriflow_ components.\n", "\n", "The imports include:\n", "\n", "- Required Python tools \n", "- _veriflow_ components, including configuration objects, Zarr datasource for reading the example data remotely, and pipeline runner\n", "- A local app for interactive visualization\n" ] }, { "cell_type": "code", "execution_count": null, "id": "63e60676", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The autoreload extension is already loaded. To reload it, use:\n", " %reload_ext autoreload\n" ] } ], "source": [ "# Add automatic reloading of modules in case of changes\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from datetime import datetime, timezone\n", "import logging\n", "import numpy as np\n", "import pandas as pd\n", "import xarray as xr\n", "import matplotlib.pyplot as plt\n", "from pathlib import Path\n", "from veriflow.configuration import Config, GeneralInfoConfig\n", "from veriflow.configuration.utils import (\n", " LeadTimes,\n", " Range,\n", " TimeUnits,\n", " VerificationPair,\n", " VerificationPeriod,\n", ")\n", "from veriflow.constants import DataType\n", "from veriflow.datasources import ZarrConfig\n", "from veriflow.pipeline import run_pipeline\n", "from veriflow.scores import CrpsForEnsembleConfig, RankHistogramConfig\n", "\n", "import sys\n", "\n", "sys.path.append(str(Path(\"..\").resolve()))\n", "from verification_plots import (\n", " crps_plot,\n", " find_crps_variable,\n", " rank_histogram_3d_plot,\n", " rank_histogram_plot,\n", " scatter_plot,\n", ")" ] }, { "cell_type": "markdown", "id": "1db18b58", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "b97ec87c", "metadata": {}, "source": [ "## 4. Defining the verification configuration\n", "\n", "_veriflow_ is a configuration-driven workflow. The configuration tells the pipeline:\n", "\n", "- which data sources to read\n", "- which forecasts and observations to compare\n", "- which lead times to evaluate\n", "- which verification metrics and statistical criteria to compute\n", "\n", "We define these parts below and then run the full workflow with a single pipeline call." ] }, { "cell_type": "markdown", "id": "4ef8053a", "metadata": {}, "source": [ "### 4.1. General verification settings\n", "\n", "The general settings define the structure of the verification experiment:\n", "\n", "- the verification period\n", "- the forecast lead times\n", "- the verification pairs\n", "\n", "Each verification pair represents a single comparison between a forecast dataset and the observations." ] }, { "cell_type": "code", "execution_count": 3, "id": "4fd66a2a", "metadata": {}, "outputs": [], "source": [ "general = GeneralInfoConfig(\n", " # Evaluate forecasts issued during this period.\n", " verification_period=VerificationPeriod(\n", " start=datetime(1988, 1, 1, tzinfo=timezone.utc),\n", " end=datetime(2008, 12, 31, tzinfo=timezone.utc),\n", " dimension=\"forecast_reference_time\",\n", " ),\n", " # Define which forecast datasets are compared against observations.\n", " verification_pairs=[\n", " VerificationPair(id=\"raw_raw\", obs=\"obs\", sim=\"raw_raw\", variable=\"Q\"),\n", " VerificationPair(id=\"lin_log\", obs=\"obs\", sim=\"lin_log\", variable=\"Q\"),\n", " VerificationPair(id=\"qqt_qqt\", obs=\"obs\", sim=\"qqt_qqt\", variable=\"Q\"),\n", " ],\n", " # Evaluate lead times from 0 to 10 days.\n", " lead_times=LeadTimes(\n", " unit=TimeUnits.day,\n", " values=Range(start=0, end=10, step=1),\n", " ),\n", ")" ] }, { "cell_type": "markdown", "id": "ae4a493b", "metadata": {}, "source": [ "### 4.2. Data sources\n", "\n", "The example data is stored in remote Zarr datasets. Zarr is well suited for cloud-based, multidimensional data access.\n", "\n", "Here we define one observation source and three forecast sources. The source names must match the names used in the verification pairs above." ] }, { "cell_type": "code", "execution_count": 4, "id": "86b7f120", "metadata": {}, "outputs": [], "source": [ "datasources = [\n", " ZarrConfig(\n", " general=general,\n", " import_adapter=\"zarr\",\n", " source=\"obs\",\n", " data_type=DataType.observed_historical,\n", " consolidated=True,\n", " path=\"https://s3.deltares.nl/deltares-verification-assets/rhine_dataset_verkade_2013/obs_Q.zarr\",\n", " ),\n", " ZarrConfig(\n", " general=general,\n", " import_adapter=\"zarr\",\n", " source=\"raw_raw\",\n", " data_type=DataType.simulated_forecast_ensemble,\n", " consolidated=True,\n", " path=\"https://s3.deltares.nl/deltares-verification-assets/rhine_dataset_verkade_2013/case-raw-raw_Q.zarr\",\n", " ),\n", " ZarrConfig(\n", " general=general,\n", " import_adapter=\"zarr\",\n", " source=\"lin_log\",\n", " data_type=DataType.simulated_forecast_ensemble,\n", " consolidated=True,\n", " path=\"https://s3.deltares.nl/deltares-verification-assets/rhine_dataset_verkade_2013/case-lin-log_Q.zarr\",\n", " ),\n", " ZarrConfig(\n", " general=general,\n", " import_adapter=\"zarr\",\n", " source=\"qqt_qqt\",\n", " data_type=DataType.simulated_forecast_ensemble,\n", " consolidated=True,\n", " path=\"https://s3.deltares.nl/deltares-verification-assets/rhine_dataset_verkade_2013/case-qqt-qqt_Q.zarr\",\n", " ),\n", "]" ] }, { "cell_type": "markdown", "id": "fa1f3a23", "metadata": {}, "source": [ "### 4.3. Verification metrics and statistical criteria\n", "\n", "In this notebook, we compute two commonly used criteria for ensemble forecasts:\n", "\n", "**Continuous Ranked Probability Score (CRPS)** assesses how well an ensemble forecast matches the observed outcome. \n", "\n", "- A CRPS of **0** indicates a perfect forecast.\n", "- Lower values indicate better probabilistic forecast quality, indicating the ensemble is centred around the observation and does not unnecessarily spread out.\n", "- CRPS increases when the forecast distribution is biased, too wide, or too narrow relative to the observation.\n", "\n", "**Rank Histogram** assesses statistical reliability by showing where observations fall within the ensemble distribution.\n", "\n", "- A roughly uniform histogram indicates a well-calibrated ensemble.\n", "- Systematic patterns can reveal biases or issues with ensemble spread.\n", "\n", "The interactive app later in this notebook can be used to explore both reuslting CRPS and rank histogram." ] }, { "cell_type": "code", "execution_count": 5, "id": "20fede92", "metadata": {}, "outputs": [], "source": [ "scores = [\n", " # CRPS\n", " CrpsForEnsembleConfig(\n", " score_adapter=\"crps_for_ensemble\",\n", " general=general,\n", " reduce_dims=[],\n", " ),\n", " # Rank histogram\n", " RankHistogramConfig(\n", " score_adapter=\"rank_histogram\",\n", " general=general,\n", " reduce_dims=[\"forecast_reference_time\"],\n", " ),\n", "]\n", "\n", "config = Config(\n", " fileversion=\"0.1.0\",\n", " general=general,\n", " datasources=datasources,\n", " scores=scores,\n", ")" ] }, { "cell_type": "markdown", "id": "bc63e7a3", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "38be2322", "metadata": {}, "source": [ "## 5. Running the verification pipeline\n", "\n", "With the configuration in place, we can run the _veriflow_ pipeline, which combines the following steps into a workflow:\n", "\n", "1. Load the input datasets\n", "2. Pair forecasts and observations\n", "3. Compute the requested verification metrics and statistical criteria\n", "4. Return everything in an `OutputDataset`" ] }, { "cell_type": "code", "execution_count": 6, "id": "39ea3d10", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2026-06-17 20:40:24,863 INFO veriflow.pipeline: Successfully initialized the configuration. \n", "\t verification_period_start = 1988-01-01 00:00:00 \n", "\t verification_period_end = 2008-12-31 00:00:00\n", "2026-06-17 20:40:24,865 INFO veriflow.pipeline: Start getting data from Zarr.\n", "2026-06-17 20:40:26,498 INFO veriflow.pipeline: Successfully got data from Zarr.\n", "2026-06-17 20:40:26,499 INFO veriflow.pipeline: Start getting data from Zarr.\n", "2026-06-17 20:40:27,018 INFO veriflow.pipeline: Successfully got data from Zarr.\n", "2026-06-17 20:40:27,019 INFO veriflow.pipeline: Start getting data from Zarr.\n", "2026-06-17 20:40:27,483 INFO veriflow.pipeline: Successfully got data from Zarr.\n", "2026-06-17 20:40:27,485 INFO veriflow.pipeline: Start getting data from Zarr.\n", "2026-06-17 20:40:27,956 INFO veriflow.pipeline: Successfully got data from Zarr.\n", "2026-06-17 20:40:27,960 INFO veriflow.pipeline: Successfully loaded all data from sources.\n", "2026-06-17 20:40:29,248 INFO veriflow.pipeline: Successfully computed CrpsForEnsemble for verification pair raw_raw.\n", "2026-06-17 20:40:30,363 INFO veriflow.pipeline: Successfully computed CrpsForEnsemble for verification pair lin_log.\n", "2026-06-17 20:40:31,397 INFO veriflow.pipeline: Successfully computed CrpsForEnsemble for verification pair qqt_qqt.\n", "2026-06-17 20:40:34,965 INFO veriflow.pipeline: Successfully computed RankHistogram for verification pair raw_raw.\n", "2026-06-17 20:40:37,746 INFO veriflow.pipeline: Successfully computed RankHistogram for verification pair lin_log.\n", "2026-06-17 20:40:40,612 INFO veriflow.pipeline: Successfully computed RankHistogram for verification pair qqt_qqt.\n", "2026-06-17 20:40:40,614 INFO veriflow.pipeline: Verification pipeline completed successfully.\n" ] } ], "source": [ "import logging\n", "\n", "logging.basicConfig(\n", " level=logging.INFO,\n", " format=\"%(asctime)s %(levelname)s %(name)s: %(message)s\",\n", " handlers=[logging.StreamHandler()],\n", ")\n", "\n", "output_dataset = run_pipeline(config)" ] }, { "cell_type": "markdown", "id": "55d77ea5", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "89e3c9fa", "metadata": {}, "source": [ "## 6. Inspecting the `OutputDataset`\n", "\n", "The pipeline returns an **`OutputDataset`**, which provides access to the verification results. Results are organized by **verification pair**, where each pair defines a comparison between an observation dataset and a forecast dataset (for example, `raw_raw` versus observations). Each verification pair contains the input data together with the computed verification metrics.\n", "\n", "We start by inspecting the output structure and then extract the results for a single verification pair." ] }, { "cell_type": "code", "execution_count": 7, "id": "d3a0a267", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "veriflow.datamodel.main.OutputDataset" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(output_dataset)" ] }, { "cell_type": "code", "execution_count": 8, "id": "54c623e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[VerificationPair(id='raw_raw', obs='obs', sim='raw_raw', variable='Q'),\n", " VerificationPair(id='lin_log', obs='obs', sim='lin_log', variable='Q'),\n", " VerificationPair(id='qqt_qqt', obs='obs', sim='qqt_qqt', variable='Q')]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output_dataset.verification_pairs" ] }, { "cell_type": "markdown", "id": "3230a76b", "metadata": {}, "source": [ "The output contains three verification pairs, corresponding to the three forecast datasets defined in the configuration.\n" ] }, { "cell_type": "markdown", "id": "4ec863ab", "metadata": {}, "source": [ "We can now select one verification pair and inspect its dataset. The returned object is an `xarray.Dataset` containing:\n", "- the forecast data\n", "- the corresponding observations\n", "- the computed verification metrics and statistical criteria (in this case, CRPS and rank histogram)\n", "- metadata and coordinates (e.g., station, lead time, and forecast reference time)" ] }, { "cell_type": "code", "execution_count": 9, "id": "659a47b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 144MB\n",
"Dimensions: (station: 88, forecast_reference_time: 2924,\n",
" lead_time: 10, realization: 5, rank: 6)\n",
"Coordinates:\n",
" * station (station) <U11 4kB 'H-RN-0001' ... 'H-RN-WURZ'\n",
" lat (station) float64 704B 0.0 0.0 0.0 ... 0.0 0.0 0.0\n",
" lon (station) float64 704B 0.0 0.0 0.0 ... 0.0 0.0 0.0\n",
" * forecast_reference_time (forecast_reference_time) datetime64[ns] 23kB 19...\n",
" * lead_time (lead_time) timedelta64[ns] 80B 1 days ... 10 days\n",
" time (forecast_reference_time, lead_time) datetime64[ns] 234kB ...\n",
" * realization (realization) int64 40B 0 1 2 3 4\n",
" * rank (rank) float64 48B 1.0 2.0 3.0 4.0 5.0 6.0\n",
"Data variables:\n",
" obs (station, forecast_reference_time, lead_time) float64 21MB ...\n",
" raw_raw (forecast_reference_time, lead_time, station, realization) float64 103MB ...\n",
" crps_for_ensemble (forecast_reference_time, lead_time, station) float64 21MB ...\n",
" histogram_rank (station, lead_time, rank) float64 42kB 1.142e+0...\n",
"Attributes:\n",
" data_type: observed_historical\n",
" units: m^3/s